CPU requests and limits for build pod #1348

Draft · wants to merge 2 commits into main

Conversation


@noroutine commented Aug 24, 2021

This PR makes the CPU request and limit for the build pod configurable in config, just like the memory settings.
A trait was implemented to be flexible in the values it accepts: integers, floats, string representations of those, as well as the millicore spec as per https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu
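
For illustration, a minimal sketch of what such a trait could look like (the class name and the normalization behaviour here are assumptions for illustration, not necessarily the PR's actual code):

from traitlets import Unicode

class CPUSpecification(Unicode):
    """Hypothetical trait accepting integers, floats, their string
    forms, and Kubernetes millicore strings like "100m"."""

    def validate(self, obj, value):
        # Plain numbers pass through as strings, e.g. 0.5 -> "0.5".
        if isinstance(value, (int, float)):
            return str(value)
        if isinstance(value, str):
            # Millicore specification, e.g. "100m".
            if value.endswith("m") and value[:-1].isdigit():
                return value
            # Plain numeric string, e.g. "1" or "0.5".
            try:
                float(value)
            except ValueError:
                pass
            else:
                return value
        self.error(obj, value)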

@welcome

welcome bot commented Aug 24, 2021

Thanks for submitting your first pull request! You are awesome! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please make sure you followed the pull request template, as this will help us review your contribution more quickly.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

@betatim
Member

betatim commented Aug 25, 2021

Thanks for this PR! I was surprised to see that we don't already have this, which means this is for sure useful :D

Looking and thinking about the code now.

Some immediate thoughts:

  1. will this work when BinderHub is not running on kubernetes? Recently @manics and/or @yuvipanda did some work on making that possible. Would be good to know how this interacts. If it would require extra work it doesn't need to be addressed here (can be a new PR) xref Allow running dev setup without kubernetes #1323
  2. do we need to have a new traitlet type? The reason I'm asking is that it adds a lot of code for which we'd need some tests, etc. We did the same for the memory spec, so we probably need it. But who knows, maybe we got smarter since then and no longer need it?
  3. there are no tests :( can we copy from/be inspired by the memory spec test suite? (a sketch of what such tests could look like follows below)
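
To make point 3 concrete, tests along these lines could work (pytest-style; CPUSpecification and its import path refer to the hypothetical sketch above, not to settled names in this PR):

import pytest
from traitlets import HasTraits, TraitError

from binderhub.utils import CPUSpecification  # hypothetical location

class C(HasTraits):
    cpu = CPUSpecification()

@pytest.mark.parametrize(
    "value, expected",
    [(1, "1"), (0.5, "0.5"), ("0.5", "0.5"), ("100m", "100m")],
)
def test_valid_cpu_specs(value, expected):
    c = C()
    c.cpu = value
    assert c.cpu == expected

def test_invalid_cpu_spec():
    with pytest.raises(TraitError):
        C().cpu = "lots"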

@manics
Member

manics commented Aug 25, 2021

I think it'll be fine for non-k8s systems, we can ignore the limits 😄.

However, if we want to make it easier to implement limits in non-k8s backends, it might be nice to tighten the specification.
E.g. CPU is currently "Can be an integer (1), fraction (0.5) or millicore specification (100m)", whereas we could choose to allow only one of those (either 0–1 or millicore).
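
For context, the two candidate formats are trivially interconvertible; a hypothetical helper for a non-k8s backend might look like:

def to_millicores(spec):
    """Normalize a fractional CPU spec (0.5, "0.5") or a millicore
    string ("500m") to an integer millicore count.
    Hypothetical helper, not part of this PR."""
    if isinstance(spec, str) and spec.endswith("m"):
        return int(spec[:-1])
    return int(float(spec) * 1000)

assert to_millicores("500m") == 500
assert to_millicores(0.5) == 500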

@noroutine
Author

noroutine commented Aug 25, 2021

@betatim, @manics, thanks for the feedback.

Tbh I was confused about units in JH/BH; it seems a different approach from k8s is used (M, G, K instead of Mi, Gi, Ki), so I was not sure what is "right" here. Due to time pressure I just went with the k8s convention, and I'm open to adjusting. On one hand it currently feels unnatural to use a convention different from k8s; on the other hand it is probably a bad idea to use a convention that differs from the rest of the JH ecosystem.

My honest feeling is that the whole idea of having these traits do extra validation/translation into a different convention breaks the principle of least surprise, and is, if not already, a source of some confusion.

If I'm an operator deploying to k8s, my expectation is that requests and limits follow k8s conventions and are passed verbatim from the config to the pod spec, rather than being transformed via traits into some magic, unexpected common denominator. The same applies to JH.

As for tests, I'd be more than glad to write some; this PR is driven by some internal requirements and we're interested in merging it upstream to avoid the cost of maintaining a fork. I will raise this internally tomorrow.

@manics
Member

manics commented Aug 25, 2021

@noroutine I understand your point! There's no single right answer: some people expect things to follow a standard format that is independent of the underlying implementation, some expect it to follow a low-level standard (the K8s spec), and others expect it to follow a different low-level standard (e.g. systemd uses percentages).

I think the decision should come down to whether we'd like to support limits in future alternative backends. If we decide only K8s matters then using the K8s format is fine, otherwise we should consider how other developers could implement it.

Regardless of the format, I think it'd be nice to include a description of it in the help text as well as a URL. That guarantees the format is always documented, even if the URL changes in the future.
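
For example (a hypothetical configurable on the BinderHub class; the trait name and help wording are illustrative, reusing the CPUSpecification sketch above):

cpu_limit = CPUSpecification(
    help="""
    CPU limit for the build pod.

    Can be an integer (1), a fraction (0.5), or a millicore
    specification (100m); see
    https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu
    """,
).tag(config=True)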

@betatim
Member

betatim commented Aug 26, 2021

The ByteSpecification comes from JupyterHub, which runs on many platforms, not just kubernetes. I think that helps explain why it isn't "just like k8s". BinderHub re-used it, probably because it was easy to reuse (even though back in the day there was no plan for BH to ever work without k8s) and because it is consistent within the Jupyter(Hub) ecosystem.

For CPU limits: the ones implemented in this PR follow the format kubernetes uses as well as what the docker CLI "natively" understands. This is also the format repo2docker uses.

So unless someone has strong opinions on an alternative format, I'd say we are consistent within our ecosystem and with kubernetes. So let's stick with this for now?

It would be great to have some tests, even just simple ones that check that the basics aren't broken. Without them, something will sooner or later break this feature and no one will notice.

@noroutine
Author

Hey guys

I've just pushed some tests to cover that trait; let me know if I can do anything more from my side to get this through!

Kind regards

Member

@consideRatio left a comment


LGTM, nice!

@consideRatio
Member

@manics @betatim tests have been added and the code looks good to me.

I also think the config is reasonable in a non-k8s context; it could be relevant to have such requests/limits there as well, and even if it isn't, we can document later that they are ignored in such environments.

Go for merge?

@betatim
Member

betatim commented Sep 24, 2021

Not sure why the tests are failing :(

@manics
Member

manics commented Sep 24, 2021

If you go to the logs for the failing check (https://github.com/jupyterhub/binderhub/pull/1348/checks?check_run_id=3697325231) and expand "Show hub logs", you'll see the Hub errors:

[E 210924 07:41:15 web:1789] Uncaught exception GET /services/binder/build/gh/binderhub-ci-repos/cached-minimal-dockerfile/HEAD (::1)
    HTTPServerRequest(protocol='http', host='localhost:8000', method='GET', uri='/services/binder/build/gh/binderhub-ci-repos/cached-minimal-dockerfile/HEAD', version='HTTP/1.1', remote_ip='::1')
    Traceback (most recent call last):
      File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/tornado/web.py", line 1704, in _execute
        result = await result
      File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/binderhub/builder.py", line 402, in get
        self.build = build = BuildClass(
    TypeError: __init__() got an unexpected keyword argument 'cpu_limit'

Work on #1318 went a lot faster than expected! The new parameter needs to be added to:

class LocalRepo2dockerBuild(Build):
    """Represents a build of a git repository into a docker image.

    This runs a build using the repo2docker command line tool.

    WARNING: This is still under development. Breaking changes may be made at any time.
    """

    def __init__(
        self,
        q,
        api,
        name,
        *,
        namespace,
        repo_url,
        ref,
        build_image,
        docker_host,
        image_name,
        git_credentials=None,
        push_secret=None,
        memory_limit=0,
        memory_request=0,
        node_selector=None,
        appendix="",
        log_tail_lines=100,
        sticky_builds=False,
    ):
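
A minimal sketch of that change (the parameter names simply mirror the memory ones and are assumptions until this PR settles on them):

from binderhub.build import Build  # assuming Build lives here

class LocalRepo2dockerBuild(Build):
    def __init__(self, *args, cpu_limit=0, cpu_request=0, **kwargs):
        # Accept the new keywords so BuildClass(..., cpu_limit=...)
        # no longer raises TypeError, and keep them for later use
        # when assembling the repo2docker options.
        self.cpu_limit = cpu_limit
        self.cpu_request = cpu_request
        super().__init__(*args, **kwargs)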

The alternative is to add this as a configurable traitlet on the Build class when it's refactored, as discussed in #1318 (comment). That way you avoid adding a parameter to BinderHub: the configurable would be passed down to the class, and since it can be a configurable on e.g. KubernetesBuilder it can use Kubernetes-specific config. The downside is that this will delay the addition of the new feature.

However, there's another issue. The build pod runs repo2docker, which does the build in another Docker container (by communicating directly with the Docker daemon through a socket). Since the Docker build takes place outside K8s, the K8s pod limits won't apply; they somehow need to be passed to repo2docker.

Based on this comment:

memory_request
    Memory request of the build pod. The actual building happens in the
    docker daemon, but setting request in the build pod makes sure that
    memory is reserved for the docker build in the node by the kubernetes
    scheduler.

a memory request is useful because it guarantees a certain amount of memory for the build pod. If I understand correctly, the build pod won't use that memory itself, so most of it will be free on the K8s node, which means the Docker container where the build occurs can use it. memory_limit has an effect because it's passed down to repo2docker, which then sets the memory limit using the Docker build API:
if self.memory_limit:
    r2d_options.append('--build-memory-limit')
    r2d_options.append(str(self.memory_limit))

https://github.com/jupyterhub/repo2docker/blob/3eef69f7454ddb9086af38b2ece5f632af05920f/repo2docker/app.py#L161-L169

This means cpu_request will effectively reserve some CPU for the build, but cpu_limit will have no effect unless support for it is also added to repo2docker.
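
To make that concrete: repo2docker has no CPU-limit option today, so the flag below is invented purely to illustrate the shape of the missing piece, mirroring the memory_limit snippet above.

# Hypothetical: '--build-cpu-limit' does not exist in repo2docker yet;
# support would need to be added there first.
if self.cpu_limit:
    r2d_options.append('--build-cpu-limit')
    r2d_options.append(str(self.cpu_limit))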

@noroutine
Author

Hi gentlemen

Unfortunately it failed again.

I was out on leave for a bit, and I'm looking at the above now, trying to work out the best course of action to make this pass again. I will dive into this again this week, but I need to understand what I should do regarding the repo2docker code.

Thank you

@manics
Member

manics commented Nov 3, 2021

@noroutine You'll need to add CPU limit support to repo2docker; then you can come back here and pass the appropriate flags when repo2docker is run.

@noroutine
Author

Thank you @manics

@manics marked this pull request as draft on February 12, 2022, 15:26